Audio Scene Understanding using Topic Models

Authors

  • Samuel Kim
  • Shiva Sundaram
  • Panayiotis Georgiou
  • Shrikanth Narayanan
Abstract

This paper introduces a method for applying topic models in an audio scene understanding framework. Assuming that an audio signal consists of latent topics that generate acoustic words describing an audio scene, we propose to use a vector quantization method to build an acoustic word dictionary. Classification experiments with semantic labels show promising results for the topic models, compared with the conventional GMM-based approach, in audio scene understanding tasks.
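The acoustic word dictionary mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means is assumed as the vector quantization method, the frame features are synthetic random vectors, and the names `nearest_word`, `build_codebook`, and `word_histogram` are hypothetical. The resulting per-clip word counts are the kind of input a topic model would take; the topic model itself is not shown.

```python
import random


def nearest_word(frame, codebook):
    """Return the index of the codeword closest to the frame (squared Euclidean)."""
    dists = [sum((x - c) ** 2 for x, c in zip(frame, word)) for word in codebook]
    return dists.index(min(dists))


def build_codebook(frames, n_words, n_iter=10, seed=0):
    """Learn an acoustic word dictionary with plain k-means (an assumed VQ choice)."""
    rng = random.Random(seed)
    # Initialise the codewords from randomly chosen training frames.
    codebook = [list(f) for f in rng.sample(frames, n_words)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_words)]
        for frame in frames:
            clusters[nearest_word(frame, codebook)].append(frame)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def word_histogram(frames, codebook):
    """Count how often each acoustic word occurs in one clip."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_word(frame, codebook)] += 1
    return counts


# Synthetic stand-in for frame-level feature vectors (assumption: 4 dimensions).
rng = random.Random(1)
training_frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
codebook = build_codebook(training_frames, n_words=8)

# One "clip" of 40 frames, represented as a bag of acoustic words.
clip_frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(40)]
counts = word_histogram(clip_frames, codebook)
```

Each clip becomes a fixed-length count vector regardless of its duration, which is what lets a text-style topic model treat audio clips like documents.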

Similar resources

Traffic Scene Analysis using Hierarchical Sparse Topical Coding

Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...

Large Margin Learning of Upstream Scene Understanding Models

Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene und...

Audio scene segmentation using multiple features, models and time scales

In this paper we present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment that is characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: (a) A definition of an audio scene (b) multiple feature models that characterize the dom...

Scene Understanding through Audio-Visual Fusion

Scene understanding involves the integration of a wide variety of information to produce a thorough description of the robot's environment. By integrating spatial, visual and audio cues, we could provide a greater amount of understanding than can be obtained using one of the modalities alone. In this paper, we describe our current work on using audition to enhance existing object detection and t...

Emotional Personae and Directorial Modeling for Interactive Entertainment

information about objects in AVT varies in granularity from scene headers describing information that is globally constant for a set of frames, to frame headers, information that is local to a single frame. Figure 1 depicts the scene header information for the video objects available to the rule base. Along with traditional "file control block" information, embedded in each object is informatio...

Publication date: 2009